FOREWORD

This report describes part of a comprehensive and continuing program of
research in multispectral remote sensing of the environment from aircraft and satel-
lites. The research is being carried out for the NASA Lyndon B. Johnson Space
Center, Houston, Texas, by the Environmental Research Institute of Michigan 
(formerly the Willow Run Laboratories, a unit of The University of Michigan’s 
Institute of Science and Technology). The basic objective of this program is to 
develop remote sensing as a practical tool for obtaining extensive environmental 
information quickly and economically. 

In recent times, many new applications of multispectral sensing have come 
into being. These include agricultural census-taking, detection of diseased plants,
urban land studies, measurement of water depth, studies of air and water pollution, 
and general assessment of land-use patterns. Yet the techniques employed remain 
limited by the resolution capability of a multispectral scanner. Techniques de- 
scribed in this report may help to overcome this limitation by enabling either ex- 
amination of the contents of a given scanner resolution cell or faster estimation 
of the contents of a larger area. 

To date, our work on estimation of proportions has included: (1) extension 
of the signature concept to a mixture of objects; (2) development of a statistical 
and geometric model for sets and mixtures of signatures; (3) evaluation of compu- 
tational methods used to estimate proportions of a mixture by maximum likelihood; 
(4) measurement of computer time required by various algorithms; (5) creation of 
a computational technique for assessing the expected accuracy of estimation as a 
function of the signature set; (6) development of techniques to identify alien objects; 
(7) testing and evaluating the proportion estimation algorithms on artificial as well 
as actual multispectral scanner data; and (8) examining the problem of establishing 
signatures when pure samples of the objects of interest are not available. 

The research covered in this report was performed under Contract NAS9-9784, 
Task B2.9, and covers the period from November 1971 through January 1973. Dr. 
Andrew Potter has been Technical Monitor. The program was directed by R. R. Legault, 
Associate Director of the Environmental Research Institute of Michigan (ERIM), 
and by J. D. Erickson, Principal Investigator and Head of the ERIM Multispectral 
Analysis Section. The ERIM number for this report is 31650-148-T.
ABSTRACT

The need for multispectral data processing methods to permit the estimation 
of proportions of objects and materials appearing within the instantaneous field of 
view of a scanning system is discussed. An algorithm developed for proportion 
estimation is described, as well as other supporting processing techniques. Appli-
cation of this algorithm to space-simulated multispectral scanner data is discussed,
and some results are presented and compared. Results reported herein indicate that,
for this data set, the true proportions of the various crops are, with one exception,
in closer agreement with the proportions determined by the proportion estimation
algorithm than with those determined by a conventional classification algorithm.
ESTIMATING CROP ACREAGE FROM
SPACE-SIMULATED MULTISPECTRAL SCANNER DATA

1 

SUMMARY 

There are many potential applications for multispectral remote sensing. Under certain 
circumstances, however, the applications are limited by the spatial resolution of the sensing 
device. To help overcome this limitation, techniques have been developed for estimating the 
proportions of the objects contained within each pixel or resolution element of a multispectral 
scanner. Such techniques were tested on artificial data in 1971 [1].

The investigations reported herein had three objectives: (1) to develop additional supporting 
techniques for application of the ERIM proportion estimation algorithm, (2) to test this algorithm 
on actual multispectral scanner data, and (3) to demonstrate the superiority of proportion estima-
tion over standard classification for estimating crop acreage or area from space-altitude multi-
spectral scanner data.

As a part of objective (1), a statistical test was developed to detect resolution elements 
containing alien or unknown material. This test, based on the geometry of the signature simplex, 
helps eliminate from consideration data which would otherwise corrupt estimates of proportions. 

Some work was done on estimating signature means and covariances from data in which 
no pure samples are to be had for training purposes. This capability is important because of 
the difficulty in finding pure training sets in low resolution data. A mathematical procedure was 
proposed for such estimation. However, in view of possible data handling difficulties, this problem 
merits further investigation. 

To satisfy objectives (2) and (3) above, multispectral scanner data were obtained from a 
flight made at 10,000 ft over an agricultural area in Lenawee County, Michigan. The data were 
digitized and various signatures generated from a number of training areas that provided pure 
data samples of six major crops. Various subsets of these signatures were evaluated by using 
a geometric signature analysis technique; then for each crop, a subset of signatures was com- 
bined to yield a final crop signature. Simulated space data were generated by smoothing (i.e., 
data point averaging) the 10,000 ft data over the scan line and along the flight line to simulate 
the 300 ft by 300 ft spatial resolution expected from the ERTS and SKYLAB multispectral scan- 
ners. The signatures generated from the unsmoothed data were used to estimate proportions 
in the space-simulated data via the method of Theil and van de Panne [2]. Proportion maps
and an alien object map were produced for the entire Lenawee County flight line. Classification
maps were also generated using the conventional, one-class-per-pixel approach.

These results were analyzed and compared with the ground truth information available
for this flight line. Analysis indicated that, for those test areas where adequate and reliable 
ground truth is available, acreages determined by proportion estimation were closer to the truth 
than those determined by conventional classification procedures. 
2

INTRODUCTION 

In recent years, remote multispectral data collection and automatic processing techniques 
have proven feasible for many user applications in the fields of earth resource survey and 
management. For some applications, however, the spatial resolution limits of multispectral 
scanners restrict the usefulness of data so obtained.

Examples of this constraint are seen in agricultural resource management where it is 
important to know the number of acres planted to each of the major crops. Spaceborne multi- 
spectral scanning systems and automatic data processing schemes have been recommended as 
efficient means for providing the necessary information. However, the information that can be 
extracted via conventional processing techniques may not be sufficiently accurate, since the 
instantaneous ground patch viewed from space altitudes by the scanner comprises a significant 
fraction of typical agricultural fields. As a result, many pixels overlap the common boundaries
between adjoining fields. 

To roughly determine the number of pixels which may reside on a field boundary, let us 
consider the idealized scene of Fig. 1. Here we see nine fields, as bounded by the solid lines, 
and a superimposed matrix of squares formed by the dashed lines with each such square defin- 
ing the ground area of a single pixel. The illustration includes only enough pixels to cover the 
center field in the scene. Clearly, many pixel areas overlap the boundaries of the center field. 
For the case illustrated, 12 of the 16 pixels cover areas lying both inside and outside the center 
field; of the total center field area, some 55% is included in these 12 pixels. 

When more than one ground cover is viewed in any given pixel, the apparent reflectance 
spectrum is modified. Figures 2 and 3 illustrate this. In Fig. 2 the reflectance spectra are
depicted as they would appear individually for corn and bare soil. But if the sensor were to 
view corn and bare soil simultaneously, the effective reflectance spectrum would be quite dif- 
ferent. This is shown in Fig. 3 for the combinations 20% corn/80% bare soil and 50% corn/50% 
bare soil where the spectra are simply weighted combinations of the pure spectra illustrated 
in Fig. 2. 

Hence it is clear that the spectra generated by a combination or mixture of two or more 
objects will differ, sometimes considerably, from the pure spectra. The resulting spectra 
will not be characteristic of either object class. Thus those pixels in Fig. 1 which overlap the
field boundaries would most likely be misclassified; and as a consequence the area covered by
the crop of the center field would be underestimated by 55%, assuming that all four pixels 
wholly within the field were correctly classified. 
We have carried out some calculations to roughly determine how serious this problem may be
for spaceborne sensors viewing an instantaneous ground patch 300 ft square. Figure 4, which 
plots the results of these calculations, illustrates the effect of field area and shape on both the 
multispectral training and classification operations. This time we include rectangular fields 
as well as square ones. Dimensions of both the square and rectangular fields are integral 
multiples of 300 ft, and those pixels which overlap field boundaries are assumed to fall one- 
half or, at field corners, three-quarters outside the field. For the rectangular fields the small 
dimension is 600 ft. Square and rectangular fields of this sort represent the limiting conditions 
(best and worst) for regularly shaped areas and for the pixel arrangement described. 

Two pairs of curves are plotted in Fig. 4. Those drawn as dashed lines relate the number 
of acres in a field to the number of 300 by 300 ft elements wholly within the field. This rela- 
tionship becomes significant in training a recognition computer when the characteristic signa- 
tures of objects to be identified are being established. For each signature some minimum num- 
ber of data points is required for adequate determination of signature statistics. (This is 
discussed in more detail in Appendix I.) As an example of the use of Fig. 4, assume that 40
samples are required. Using the dashed lines, we find that for 40 samples to be entirely within 
the boundaries of a single field, a square field of at least 110 acres would be required and a 
2 × n rectangular field of at least 170 acres. These field sizes are relatively large and fields
of this size may not exist for all crops in a scene. This suggests the difficulties that are likely 
to arise in the training operation alone. 

If training of the computer had already been accomplished, however, it would be desirable 
to know what errors might be expected when automatically determining the area covered by a 
selected set of object classes. Figure 4 shows these errors. The curves drawn as solid lines 
relate the number of 300 by 300 ft elements in the field to the percentage of the field area which 
is seen in combination with portions of adjoining fields. As previously mentioned, it is those 
pixels that overlap the field boundaries which would probably be classified incorrectly and thus 
produce the errors in computed acreage. For the 110-acre square field described above, 40 
elements were totally within the field so a 25% error in the determination of the area of that 
field could be expected. For the 170-acre rectangular field a 51% error in field area could be 
expected. Because of the generally limited size of farms in the Midwest, the fields are often
rectangular in shape to permit more efficient planting and harvesting. Therefore the expected 
errors for the assumed pixel arrangement may well exceed the minimum value shown. 
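The figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration of the stated geometry, not the calculation used to draw Fig. 4: it assumes that edge pixels lie half inside the field and corner pixels one-quarter inside, so a field whose wholly interior pixels form a w by l block spans (w + 1) by (l + 1) pixel widths. The function name `field_stats` and the treatment of the 40-pixel square as having a non-integer side are assumptions made for this example.

```python
import math

PIXEL_FT = 300.0                      # side of one resolution element, in feet
SQFT_PER_ACRE = 43560.0
PIXEL_ACRES = PIXEL_FT ** 2 / SQFT_PER_ACRE   # roughly 2.07 acres per pixel


def field_stats(interior_w, interior_l):
    """Return (field area in acres, fraction of field area seen in boundary pixels).

    interior_w, interior_l: dimensions, in pixels, of the block of pixels lying
    wholly inside the field. Half-in edge pixels and quarter-in corner pixels
    add one pixel width in total to each dimension of the field.
    """
    interior = interior_w * interior_l
    total = (interior_w + 1.0) * (interior_l + 1.0)
    return total * PIXEL_ACRES, 1.0 - interior / total


# Idealized scene of Fig. 1: 4 interior pixels (2 x 2) out of 16 touching the field.
_, frac_fig1 = field_stats(2, 2)            # 5/9, i.e. "some 55%"

# Square field holding 40 interior pixels (side treated as continuous).
side = math.sqrt(40)
acres_square, err_square = field_stats(side, side)   # about 111 acres, about 25%

# 2 x n rectangular field: 600 ft wide, so the interior block is 1 pixel wide.
acres_rect, err_rect = field_stats(1, 40)            # about 169 acres, about 51%

print(round(frac_fig1, 3), round(acres_square, 1), round(err_square, 3))
print(round(acres_rect, 1), round(err_rect, 3))
```

Under these assumptions the arithmetic reproduces the 55%, 110-acre/25%, and 170-acre/51% values cited in the text to within rounding.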

Not only are these errors of significant size, but in many regions of the country the field 
sizes considered above are somewhat larger than is typical. As shown in Fig. 4, even larger 
errors would result for smaller fields. 
If, as we have illustrated here, the percentage of pixels containing class boundaries is large,
then a realistic estimation of acreage from low resolution scanner data may require a determina-
tion of the proportions of classes contained within the boundary pixels. The errors due to overlap
might be reduced by increasing the system resolution, but this is not always feasible. It
would seem easier to avoid this problem by estimating the proportions of the several materials 
in each pixel. 

Toward developing suitable algorithms for estimating proportions in a mixture, considerable 
previous work has already been done [1, 3] . A simple Gaussian model was constructed for re- 
lating the multispectral signature of a mixture to the signatures of component materials. With 
this model, several existing computational methods were adapted for estimating the proportions 
by maximum likelihood. Special emphasis was placed on a method that assumes equal covariance 
matrices, since this simplifies the optimization problem and since this assumption can be justi- 
fied practically and theoretically (Ref. 3, Section 3.4). Artificial scanner data were generated to 
test the computational methods for accuracy and speed. We found it possible to estimate pro- 
portions over an entire rectangular area directly from the average of data vectors from that 
area, with little or no decrease in accuracy and considerable increase in speed. Proportions
estimated with and without averaging were compared. Also, a geometric criterion was de- 
veloped to determine in advance whether good estimates could be obtained from a given set of 
signatures. 

The present report, summarizing work accomplished to date, attempts to show the use- 
fulness of the proportion-estimation method in a typical remote sensing application. Because 
the measurement of agricultural resources is a common application, an agricultural test area 
was chosen. And because we anticipated that estimation of proportions will be a necessary 
part of ERTS or SKYLAB measurement of earth resources, the estimation program was 
tested on simulated space data. 

The next section (Section 3) describes the model used for mixture signatures and sum- 
marizes the computational methods required to solve the proportion-estimation problem. It 
further discusses the problem of detecting alien objects — that is, objects in the scene not 
represented by any of the given signatures. Statistical and geometrical criteria have been 
developed for detecting and eliminating data points that probably do not represent mixtures of 
the given materials (and hence could corrupt the estimated proportions).

Section 4 describes the preparation of a data set, the application of the proportion estima-
tion to the data set, and an analysis of the results.
Section 5 deals with estimation of signature means and covariances from mixtures data.
This type of estimation may be essential in obtaining signatures from space data in which very
few data points may correspond to "pure" crops or surface compositions.
3

DEVELOPMENT OF THE PROPORTION ESTIMATION CAPABILITY 

In this section our model for signatures is described and the mathematical procedures for 
estimation of proportions are defined. We describe the model and solution of the estimation 
problem by maximum likelihood. We then discuss the quadratic programming solution to the
optimization problem derived from the maximum likelihood criterion, and also some of the
algorithms considered. Finally we discuss the alien object problem and explain our method of
geometric signature analysis.

3.1. MODEL FOR SIGNATURES OF MIXTURES

If the IFOV of a multispectral scanner is large compared to the structure of the scene being
scanned, a single resolution cell or pixel may contain more than a single object or material.
Suppose the scanner has n spectral channels and that the signature of object class i, where
$1 \le i \le m$, is represented by an n-dimensional Gaussian distribution with mean $A_i$ and
covariance matrix $M_i$. If the proportion of object class i in the resolution cell is $p_i$, and p is the
vector $(p_1, \ldots, p_m)$, then let the mixture signature for this combination of classes have
mean vector $A_p$ and covariance matrix $M_p$.

To find expressions for $A_p$ and $M_p$, consider the following model. If the resolution cell
contains elements only of object class i, assume that it contains $N_i$ elements. With each
element, associate a random variable with mean $A_i^*$ and covariance matrix $M_i^*$. We then have

$$A_i = N_i A_i^*$$

If we assume statistical independence for these $N_i$ random variables, then we also have

$$M_i = N_i M_i^*$$

Now, if the proportion of the resolution cell covered by elements of object class i is $p_i$, then
the number of elements of this type in the resolution cell is $p_i N_i$. Thus

$$A_p = \sum_i p_i N_i A_i^* = \sum_i p_i A_i$$

If we assume statistical independence for random variables associated with elements from
different object classes, we also obtain

$$M_p = \sum_i p_i N_i M_i^* = \sum_i p_i M_i$$
Since the pure signatures of objects i are taken to be Gaussian distributions, the distribution
associated with the proportion vector p is also Gaussian and may be defined by the mean and
covariance

$$A_p = \sum_i p_i A_i$$

$$M_p = \sum_i p_i M_i$$

These formulas constitute our model for signatures of object-class combinations (mixtures)
in terms of signatures of the individual object classes.
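As a minimal sketch of this model, the function below forms the mixture mean and covariance as proportion-weighted sums of the pure-class means and covariances. The function name `mixture_signature`, the array layout, and the numerical values in the usage lines are hypothetical and chosen only for illustration; they are not taken from the report's data.

```python
import numpy as np


def mixture_signature(means, covs, p):
    """Mixture mean and covariance as proportion-weighted sums.

    means: array of shape (m, n), one mean vector per object class
    covs:  array of shape (m, n, n), one covariance matrix per class
    p:     array of shape (m,), nonnegative proportions summing to 1
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    p = np.asarray(p, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be nonnegative and sum to 1")
    mean_p = p @ means                      # sum_i p_i * A_i
    cov_p = np.tensordot(p, covs, axes=1)   # sum_i p_i * M_i
    return mean_p, cov_p


# Hypothetical two-class, two-channel example: 20% of class 1, 80% of class 2.
example_means = [[10.0, 40.0], [30.0, 20.0]]
example_covs = [4.0 * np.eye(2), 1.0 * np.eye(2)]
mean_p, cov_p = mixture_signature(example_means, example_covs, [0.2, 0.8])
```

With these made-up inputs the mixture mean is the weighted average of the two class means, and the mixture covariance is the same weighted average of the two class covariances.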


3.2. ESTIMATION OF PROPORTIONS 

The model for a mixture signature can now be used to estimate the proportion vector p
for a mixture of materials in the resolution cell corresponding to a signal data vector from a
multispectral scanner. This is done by minimizing the negative log of the likelihood function
associated with the Gaussian distribution having mean $A_p$ and covariance $M_p$. Let $A_i$ be the
signature mean vector for the i-th object class, and $M_i$ the corresponding covariance matrix.
Let y be the n-dimensional data vector from the scanner. Then the proportion vector p is
estimated by finding a value for p that minimizes

$$F(p) = \ln|M_p| + \langle y - A_p,\; M_p^{-1}(y - A_p) \rangle$$

subject to the constraint that

$$\sum_{i=1}^{m} p_i = 1 \quad \text{and} \quad p_i \ge 0 \ \text{for} \ 1 \le i \le m$$

where $A_p$ and $M_p$ are as defined above. Note that $|M|$ is the determinant of M, $M^{-1}$ is its
matrix inverse, and $\langle u, v \rangle$ denotes the inner product of vectors u and v. Up to a factor
of 2, F(p) is the negative natural log of the Gaussian density function with mean $A_p$ and
covariance $M_p$, evaluated at the point y, with the constant term eliminated.

In general, minimizing F(p) subject to the given constraints is quite difficult, but it can be
shown that the minimization problem is easier if the $M_i$ are equal. Also, it can be shown that the
optimal p is unique whenever the covariance matrices $M_i$ are equal. Previous work with simu-
lated data (Ref. 3) has shown that the equal covariance assumption leads to a good approximation
to the true proportions, even in the cases where the $M_i$ are not equal. For these reasons our
previous work emphasized the case where the $M_i$ are equal, and our present estimation algorithm
also makes this assumption.
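To make the objective concrete, the sketch below evaluates F(p) for a given proportion vector and, for the two-class case only, minimizes it by a coarse search over the constraint set. This brute-force search is an illustrative stand-in, not the estimation algorithm used in this work; the function names, the grid resolution, and the array layout are assumptions.

```python
import numpy as np


def neg_log_likelihood(p, means, covs, y):
    """F(p) = ln|M_p| + <y - A_p, M_p^{-1} (y - A_p)> for the mixture model.

    means: (m, n) class means; covs: (m, n, n) class covariances;
    p: (m,) proportions; y: (n,) data vector.
    """
    p = np.asarray(p, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_p = p @ means
    cov_p = np.tensordot(p, covs, axes=1)
    _, logdet = np.linalg.slogdet(cov_p)    # log of the determinant of M_p
    r = y - mean_p
    return logdet + r @ np.linalg.solve(cov_p, r)


def grid_search_two_classes(means, covs, y, steps=101):
    """Minimize F over p = (1 - t, t), 0 <= t <= 1, on an evenly spaced grid."""
    best_p, best_f = None, np.inf
    for t in np.linspace(0.0, 1.0, steps):
        p = np.array([1.0 - t, t])
        f = neg_log_likelihood(p, means, covs, y)
        if f < best_f:
            best_p, best_f = p, f
    return best_p, best_f
```

The grid automatically satisfies both constraints (nonnegativity and unit sum), which is why it is usable here; for more than two or three classes a search of this kind quickly becomes impractical, which is one motivation for the equal-covariance formulation that follows.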
3.2.1. SOLUTION BY QUADRATIC PROGRAMMING

Suppose now that all covariances are equal, so that $M_i = M$ for all i. Then the function
F(p) becomes

$$F(p) = \ln|M| + \langle y - A_p,\; M^{-1}(y - A_p) \rangle$$

The first term drops out of the optimization problem, since it is constant. If M is factored
into the form

$$L L^T = M$$

then

$$\langle y - A_p,\; M^{-1}(y - A_p) \rangle = \langle y - A_p,\; (L^{-1})^T L^{-1}(y - A_p) \rangle = \langle L^{-1}(y - A_p),\; L^{-1}(y - A_p) \rangle$$

Thus minimizing F(p) is equivalent to minimizing

$$G(p) = \|Z - B_p\|^2$$

where $Z = L^{-1}y$ and $B_p = L^{-1}A_p$. This is geometrically equivalent to projecting Z onto
the convex hull of the points $B_i = L^{-1}A_i$. With the constraints on p, the problem becomes one
of quadratic programming and can be solved by any of several known methods.
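The equal-covariance reduction can be sketched as follows. The data and signature means are first transformed with the Cholesky factor of the common covariance; the closest point of the transformed simplex is then found by exhaustive enumeration of vertex subsets. This enumeration is a deliberately simple substitute for the quadratic programming methods named in the text (it tries every subset rather than pruning), and it assumes the transformed vertices are affinely independent. All function names are hypothetical.

```python
import itertools
import numpy as np


def whiten(M, A, y):
    """Return (B, Z) with B = L^{-1} A and Z = L^{-1} y, where L L^T = M.

    A holds the signature means as columns, shape (n, m); y has shape (n,).
    """
    L = np.linalg.cholesky(M)           # lower-triangular factor of M
    return np.linalg.solve(L, A), np.linalg.solve(L, y)


def affine_weights(B, Z):
    """Weights q with sum(q) = 1 (signs unrestricted) minimizing ||Z - B q||^2.

    Solves the equality-constrained least-squares problem through its
    Lagrangian (KKT) linear system.
    """
    k = B.shape[1]
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = 2.0 * B.T @ B
    kkt[:k, k] = 1.0
    kkt[k, :k] = 1.0
    rhs = np.concatenate([2.0 * B.T @ Z, [1.0]])
    return np.linalg.lstsq(kkt, rhs, rcond=None)[0][:k]


def estimate_proportions(B, Z, tol=1e-9):
    """Proportions p (p >= 0, sum 1) minimizing ||Z - B p||^2 by subset enumeration."""
    m = B.shape[1]
    best_p, best_d = None, np.inf
    for k in range(1, m + 1):
        for keep in itertools.combinations(range(m), k):
            idx = list(keep)
            q = affine_weights(B[:, idx], Z)
            if np.all(q >= -tol):                  # candidate lies in the simplex
                p = np.zeros(m)
                p[idx] = np.clip(q, 0.0, None)
                d = float(np.sum((Z - B @ p) ** 2))
                if d < best_d:
                    best_p, best_d = p, d
    return best_p
```

A typical call would be `B, Z = whiten(M, A, y)` followed by `p = estimate_proportions(B, Z)`. The number of subsets grows as 2^m, so this sketch is only reasonable for a handful of signatures.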

In our previous work, three algorithms were considered for solving the quadratic pro- 
gramming problem: the Frank and Wolfe, Complementary Pivot, and Theil and van de Panne 
methods. A brief description is given below for the Theil and van de Panne method, which is 
the algorithm used in our present program. For descriptions of the other two methods, see 
Refs. 1 and 3. 

The Theil and van de Panne method minimizes G(p) by computing certain projections of Z
onto hyperplanes determined by the vertices $B_i$. (Usually one does not have to compute all such
projections.) Let S be a subset of the index set {1, . . . , m} and let $\pi_S$ be the orthogonal pro-
jection of Z onto the hyperplane spanned by the $B_i$ such that $i \notin S$. Then there exists a propor-
tion vector $p^S$ which satisfies

$$\pi_S(Z) = \sum_i p_i^S B_i$$

where $p_i^S = 0$ for $i \in S$.

If $p^0$ is optimal, it can be shown that there is a subset S of the index set such that

$$p_i^0 = p_i^S \quad \text{for all } i$$

Thus $p^0$ can be obtained by projecting Z onto the hyperplane spanned by the $B_i$, $i \notin S$, for some S.
This method also permits a reduction in the number of projections required to compute the
optimum. (For more detail, see the appendices of Refs. 1 and 3.)
Note that the problem of estimating p when the $M_i$ are equal can be expressed entirely in
geometrical terms. The vectors $A_i$ can be regarded as points in n-space. The set of all possi-
ble mixture signature means $\sum_i p_i A_i$, where $p = (p_1, \ldots, p_m)$ is a proportion vector, is
the convex hull of the $A_i$ and is called the signature simplex. The covariance matrices $M_i$ can
be regarded geometrically as hyperellipsoids centered at the respective vertices $A_i$ of the sim-
plex. Similarly the vectors $B_i = L^{-1}A_i$ are points in n-space and form a geometrical figure
called the transformed signature simplex, which is the convex hull of the $B_i$. The transformed
covariance matrices are all identity and are represented by unit spheres centered around the $B_i$.
The point $B_p$ corresponding to the optimal p is the closest point in the transformed simplex to
the point $Z = L^{-1}y$. The convex hull of the $B_i$ will be the same as the geometrical figure whose
edges connect the $B_i$ in space if the (n + 1)-dimensional vectors $(1, B_i)$ are linearly independent.

3.3. DETECTION OF ALIEN OBJECTS 

The maximum likelihood estimate of p obtained by the methods described in previous text 
may be greatly in error if the area contains alien objects not represented in the signature set. 
(Objects not represented in the signature set are called alien objects.) To assure fairly accu- 
rate estimates of the proportions, it is essential that pixels which include large portions of alien 
material be eliminated from consideration. Statistical tests have been developed for detecting 
such points, using the geometric concept of the signature simplex. 

It is not possible, without additional information, to estimate the proportions of unspecified
materials in a mixture. Nor is it possible to determine unquestionably whether a data point 
represents a mixture of the specified materials. This can be done only in a statistical sense. 

If a data point does represent such a mixture, we would expect it to lie fairly close to the signa- 
ture simplex; just how close will depend on some statistical threshold. 

Some alien objects are easier to detect than others. Overall ease of detection depends on 
(1) the actual proportion of the alien material present and (2) differences in spectra between
the alien materials and the materials specified for the mixture. This dependence is true no 
matter what statistical threshold is used. 

We will now briefly discuss the manner in which alien objects are detected. (Appendix I 
contains more details on the implementation of the alien object tests.) The approach taken can
be explained in purely geometric terms. The locus of mixture signature means is the signature 
simplex. Indeed, if there were no statistical variation in the data, then every data point repre- 
senting a mixture of the specified materials would lie in the simplex. With statistical variation, 
every such data point will lie "close" to the simplex. This is the key idea.
For the sake of mathematical simplicity, the analysis that follows is based entirely on the
transformed simplex and data point Z. However, the alien object test computations carried
out as a part of the proportion estimation program employ only the untransformed data vectors.

The alien object test is divided into two parts. The first part, the hyperplane criterion, 
deals with the projection of Z onto the linear variety determined by the signature simplex. The 
second part, the out-of-plane test, tests the magnitude of the component of Z lying outside this
linear variety. Both are used to determine whether Z is "too far" from the signature simplex.

3.3.1. HYPERPLANE CRITERION

Let y be the data vector, $Z = L^{-1}y$, and let $B_i$ be the vertices of the transformed simplex.
Since we are dealing with the projection of Z onto the space spanned by the $B_i$, we can assume
without loss of generality that Z lies within this space. The hyperplane criterion will deter-
mine whether a given point Z lies within s "standard deviations" of the locus of mixture signa-
ture means. The locus of mixture signature means is the signature simplex. If this is the simplex
of transformed means, the locus of points within s standard deviations of any mixture signature
mean is approximated by a simplex similar to but larger than the signature simplex. Its faces are
parallel to the corresponding faces of the signature simplex and lie a distance of s units from them.
This is the "extended signature simplex s units away," which is shown in Fig. 5 for two dimen-
sions and three signatures. The vertices ($B_1$, $B_2$, and $B_3$) of the solid-line triangle represent
the signature means while the triangle itself describes the locus of mixture signature means.

The dashed-line triangle is the extended signature simplex. If the transformed data point Z is
outside the extended simplex with s > 2, there is a low probability that y represents a mixture
of only the elements whose means form the signature simplex. Thus the corresponding pixel
likely contains some alien materials, so proportions should not be estimated for that pixel. The
point Z lies within the extended simplex if and only if it is on the same side of each face
(hyperplane) of the extended simplex as the corresponding opposite vertex of the signature
simplex. By using the hyperplanes through the faces, it is easy to determine whether Z is inside
or outside. If some preliminary computations are first carried out, this test is easy to implement.
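One way to carry out the inside/outside test just described is sketched below. It assumes a full-dimensional transformed simplex (n + 1 vertices in n dimensions, n at least 2), with Z already expressed in the coordinates of the space spanned by the vertices. For each face it builds a unit normal pointing toward the opposite vertex and checks that Z is no more than s units on the far side of that face. The function name and the use of a singular value decomposition to obtain the normals are choices made for this illustration.

```python
import numpy as np


def inside_extended_simplex(vertices, Z, s):
    """True if Z lies inside the simplex whose faces are pushed outward by s.

    vertices: array of shape (n + 1, n), one transformed signature mean per row
    Z:        transformed data point, shape (n,)
    s:        tolerance in standard deviations (units of the transformed space)
    """
    V = np.asarray(vertices, dtype=float)
    Z = np.asarray(Z, dtype=float)
    m, n = V.shape
    if m != n + 1 or n < 2:
        raise ValueError("expected n + 1 vertices in n >= 2 dimensions")
    for k in range(m):
        face = np.delete(V, k, axis=0)        # vertices of the face opposite V[k]
        edges = face[1:] - face[0]            # (n - 1) directions within the face
        # The last right-singular vector is orthogonal to every edge direction.
        normal = np.linalg.svd(edges)[2][-1]
        if normal @ (V[k] - face[0]) < 0:     # orient toward the opposite vertex
            normal = -normal
        if normal @ (Z - face[0]) < -s:       # beyond the shifted face
            return False
    return True
```

Because the normals depend only on the signatures, they can be computed once per signature set; the per-point work is then a few inner products.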

3.3.2. OUT-OF-PLANE TEST

The hyperplane criterion, in actual fact, tests only the projection of Z onto the hyperplane 
determined by the vertices $B_i$. But it is still possible, even if Z satisfies that criterion, for Z
itself to be far away from the signature simplex, indicating that Z probably represents a mixture
containing much alien material. In such a case the hyperplane criterion is not an adequate test.
Therefore it is desirable to test those components of Z which are orthogonal to the hyperplane
determined by the $B_i$. Such a test has been formulated and is called the out-of-plane test.

(See Appendix I for more details.) 
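A minimal sketch of such a test is given below. It measures the length of the component of Z orthogonal to the flat through the transformed vertices, using a least-squares fit to the edge directions. The function names are hypothetical, and the comparison against `s * sqrt(phi)` is an assumed form of the threshold adopted for this illustration; the actual implementation is described in Appendix I.

```python
import numpy as np


def out_of_plane_distance(vertices, Z):
    """Length of the component of Z orthogonal to the flat through the vertices.

    vertices: array of shape (m, n), one transformed signature mean per row
    Z:        transformed data point, shape (n,)
    """
    V = np.asarray(vertices, dtype=float)
    Z = np.asarray(Z, dtype=float)
    edges = (V[1:] - V[0]).T                    # (n, m - 1) in-plane directions
    offset = Z - V[0]
    coeffs = np.linalg.lstsq(edges, offset, rcond=None)[0]
    residual = offset - edges @ coeffs          # part of Z outside the flat
    return float(np.linalg.norm(residual))


def fails_out_of_plane(vertices, Z, s, phi):
    """Flag Z as alien when its out-of-plane distance exceeds s * sqrt(phi).

    The threshold form is an assumption made for this sketch.
    """
    return out_of_plane_distance(vertices, Z) > s * np.sqrt(phi)
```

This check is only informative when the number of channels exceeds the dimension of the simplex; otherwise the residual is zero for every data point.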
3.3.3. ALIEN OBJECT TEST SUMMARY

To summarize: The hyperplane criterion determines whether the in-plane projection of Z
is more than s standard deviations from the simplex. The out-of-plane criterion tests whether
any out-of-plane components of Z are more than $s\sqrt{\phi}$ away from the simplex, where $\phi$ is the
"out-of-plane factor" specified in the control input. The number $\phi$ is specified by the user and
provides a certain flexibility in testing for alien objects.

The alien object test, if s and $\phi$ are properly chosen, can be very useful in eliminating data
points which will corrupt estimates of the proportions. However, like any other statistical test,
its validity depends on the choice of these threshold values. For the experiment described in this
report, s and $\phi$ were chosen empirically in order to reject a fixed percentage of the training set
data points.
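One simple way to pick a threshold that rejects a fixed fraction of training points is to take a quantile of the distances observed on the training set. The sketch below illustrates that idea only; it is not a description of how the thresholds were actually set for this experiment, and the function name and inputs are hypothetical.

```python
import numpy as np


def threshold_for_rejection_rate(distances, reject_fraction):
    """Threshold such that about reject_fraction of the distances exceed it.

    distances:       per-point distances from the signature simplex, measured
                     on training data (in-plane or out-of-plane, as required)
    reject_fraction: fraction of training points to be rejected, 0 to 1
    """
    d = np.asarray(distances, dtype=float)
    return float(np.quantile(d, 1.0 - reject_fraction))
```

The same function could be applied separately to in-plane and out-of-plane distances to obtain two thresholds with a common rejection rate.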

The computations required to implement both the hyperplane criterion and the out-of-plane
test have been much simplified in the interest of minimizing the amount of computer time per
data point. Most of the computation is done once for each rectangular area in a subroutine of
the proportion estimation program. The point-by-point computation then consists of two matrix
multiplications plus a few other simple operations. On the Control Data C1604 digital computer
the processing time required is 0.0167 sec/point.

3.4. DESCRIPTION OF THE PROPORTION ESTIMATION PROGRAM 

By using the procedures described in the previous two sections, a program named MIXMAP 
has been written to estimate the proportions of objects from multispectral data. Inputs consist 
of (1) control information, (2) the required signatures, and (3) the data from which proportions 
are to be estimated. Outputs consist of (1) estimated proportions and number of alien objects 
detected in the area of interest, and (2) a series of digital maps displaying the approximate 
proportions of each crop in each resolution cell and a map of the alien objects detected. 

The MIXMAP program affords the user a number of options, each specified to the com-
puter by the control input. One option is to estimate the proportions "with averaging" over each
rectangular area specified to the program. Normally, proportions are estimated separately
from each resolution element or data point within an area and then averaged to obtain estimated
proportions for the entire area. In application of the averaging option, however, MIXMAP will
first average the data vectors and then estimate proportions for the entire area from the aver-
aged data vector. (Reference 3 discusses this technique.)

Another option is the use of "covariance factors." Under this option, scaling of the covariance
matrices is made possible, should the user desire to do so. If this option is not used the co- 
variance factors are all set to unity. 

MIXMAP is capable of testing for and eliminating alien objects from consideration. This
also is a processing option. When the option is exercised, two numbers are required to
specify the threshold (statistical) level of the test: one is the number s of standard deviations
of tolerance; the other is an "out-of-plane factor" $\phi$. In this option every data point is tested
individually, whether the averaging option is used or not.

The control input, in addition to its specification of options, must include: (1) number of 
pure signatures, (2) number of spectral channels used in defining the signatures, (3) subset of 
channels to be used in estimating proportions, (4) names of the pure signatures, and (5) the 
standard data-processing inputs (number of files to skip on the data tape, and the line and point
limits for each rectangular area to be processed).

When the averaging option is not used, MIXMAP will provide a series of maps showing the 
location of the alien objects detected and, for each pixel, the approximate proportions of each 
material for which a pure signature is provided. 

Figure 6 shows examples of the maps generated when carrying out standard (one class for 
one point) classification and proportion estimations of each class for each point. The standard 
classification map appears in the center of the group of figures. Here a distinct symbol is 
assigned to each class; then as each point or resolution element is classified, the correspond- 
ing symbol is printed in the appropriate location. For this map the classes corn, soybeans, 
bare soil, alfalfa, cut alfalfa, and alien are represented by the symbols 0, X, X, =, and blank, 
respectively. When estimating proportions one map is generated for each class with symbols 
now assigned to represent the proportion of that class at each point. For purposes of this 
illustration the proportion ranges of 0-20%, 20-60%, and 60-100% are represented by blank,
*, and 1^1, respectively. Of course, as many ranges from 0-100% may be designated and mapped
as there are distinct symbols.
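
The symbol assignment described above can be sketched in a few lines. This is only an illustration, not MIXMAP's code: the function names are hypothetical, "#" stands in for the third print symbol (which is not legible in this copy), and the treatment of values falling exactly on a range boundary (assigned to the higher range) is an assumption.

```python
def proportion_symbol(p, bins=((0.20, " "), (0.60, "*"), (1.00, "#"))):
    """Return the print symbol for a proportion p in [0, 1].

    bins lists (upper limit, symbol) pairs in increasing order.
    Assumption: a value exactly on a boundary goes to the higher range.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    for upper, symbol in bins[:-1]:
        if p < upper:
            return symbol
    return bins[-1][1]


def proportion_map(grid):
    """Turn a 2-D grid of proportions for one class into printable lines."""
    return ["".join(proportion_symbol(p) for p in row) for row in grid]


if __name__ == "__main__":
    for line in proportion_map([[0.1, 0.3, 0.9], [0.2, 0.6, 0.0]]):
        print("|" + line + "|")
```

One such map would be printed per class; adding more (upper limit, symbol) pairs gives a finer gradation.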

Only the alien object map differs from what has just been described. For this map
proportions are not used, and all points to be designated as alien are assigned a particular symbol.
An alternate method is to assign several distinct symbols, each specifying a range of "distance"
of the point in question from the vector space spanned by the signatures.

3.5. SIGNATURE ANALYSIS 

In our previous work we developed a technique for determining whether good estimates of
proportions can be obtained with a given set of signatures. This is an important consideration
because, with some signature sets (for example, where one material is spectrally similar to
a mixture of the others), the estimates will be poor at best. Whether good estimates can be
obtained depends on the shape of the signature simplex. To minimize the error in distinguishing
between pure objects with means $A_i$, the distances between the signature means $A_i$ should be
large relative to the spreads (covariances). This requirement applies also to the mixtures
problem. But to obtain good estimates of proportions still another condition must be satisfied:
namely, that no vertex of the signature simplex is "too close" to the convex hull of the other
vertices. If this last condition is not satisfied, one object might be improperly interpreted as
a mixture of other objects. The likelihood of such an occurrence can be foreseen by geometric
signature analysis.

The key to such analysis is that the signatures and the relationships between them can be
viewed entirely in geometrical terms. The signature simplex has already been defined (see
Section 3.3.1) as the geometrical figure in signal space whose edges connect pairs of pure 
signature means. In the nondegenerate case, the one of interest here, each pure signature 
comprises a distinct vertex of this simplex. The covariance matrices can be interpreted in 
terms of loci of constant probability. For a Gaussian distribution these loci are hyperellipsoids, 
centered about the signature mean. One of these ellipsoids, called the unit contour ellipsoid, is 
chosen for each signature. These contour ellipsoids can be depicted with the signature simplex. 
A specific geometrical configuration for three signatures and two channels is shown in Fig. 7. 

In general, if $A_i$ is the signature mean and $M_i$ is the covariance matrix, the unit contour
ellipsoid is the set of vectors $v$ satisfying

$$\langle v - A_i,\; M_i^{-1}(v - A_i) \rangle = 1$$

Thus the geometrical configuration is completely determined by the $A_i$ and $M_i$.

Inaccuracies in the estimate of the proportion vector are likely if any vertex of the
signature simplex is too close to the opposite face (convex hull of the remaining vertices). What is
"too close" depends on the size and shape of the unit contour ellipsoid about that
vertex and hence on the corresponding covariance matrix. Figure 7 depicts a configuration in
which the vertices are well separated, in a probability sense, from their opposite faces.
Figure 8, on the other hand, illustrates an ill-conditioned configuration in which some points on the
face (segment) are close in a probability sense to the opposite vertex.

Geometrical signature analysis consists of computing a relative distance $r_i$ from each
vertex $A_i$ of the signature simplex to its opposite face. In general, $r_i$ is a measure of how far
one signature is from the convex hull of the others, and may be regarded as a probability
distance. If all the $r_i$ are large, estimates of the proportion vector should be reliable. On the
other hand, if any $r_i$ is small, significant estimation errors may result. The number $r_i$ can be
obtained in three steps:

(1) Find a matrix $P$ such that the i-th contour ellipsoid transformed by $P^{-1}$ is a unit
circle about the transformed $A_i$.

(2) Transform all the $A_j$ by $P^{-1}$.

(3) Using the simplex formed by the transformed $A_j$, let $r_i$ be the minimum distance from
the transformed $A_i$ to its opposite face.

FIGURE 7. SIGNATURE SIMPLEX WITH UNIT CONTOUR ELLIPSOIDS

FIGURE 8. ILL-CONDITIONED SIGNATURE SIMPLEX

More specifically, this involves the following computational steps for each signature $i$:

(1) Compute by Cholesky decomposition a matrix $P$ such that $PP^T = M_i$.

(2) Compute the transformed signature mean vectors $B_j = P^{-1}A_j$.

(3) Let $C_i$ be the closest point to $B_i$ in the convex hull of the other $B_j$. (This may be
computed by the Theil and van de Panne method.)

(4) Compute $r_i = \|B_i - C_i\|$.
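
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's program: all function names are hypothetical, and step (3) uses a simple Frank-Wolfe iteration with exact line search in place of the Theil and van de Panne method, which is not reproduced here.

```python
import math


def cholesky(M):
    """Lower-triangular P with P P^T = M (M symmetric positive definite)."""
    n = len(M)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(P[i][k] * P[j][k] for k in range(j))
            P[i][j] = math.sqrt(s) if i == j else s / P[j][j]
    return P


def forward_solve(P, a):
    """Return b = P^{-1} a for lower-triangular P (solves P b = a)."""
    b = []
    for i in range(len(a)):
        b.append((a[i] - sum(P[i][k] * b[k] for k in range(i))) / P[i][i])
    return b


def closest_in_hull(q, pts, iters=2000):
    """Frank-Wolfe search for the point of conv(pts) nearest to q.

    Stand-in for the Theil and van de Panne method named in the text.
    """
    x = list(pts[0])
    for _ in range(iters):
        g = [xi - qi for xi, qi in zip(x, q)]  # gradient of 0.5*|x - q|^2
        s = min(pts, key=lambda p: sum(gi * pi for gi, pi in zip(g, p)))
        d = [si - xi for si, xi in zip(s, x)]
        dd = sum(di * di for di in d)
        if dd == 0.0:
            break
        # exact line search along the segment from x toward s
        t = max(0.0, min(1.0, -sum(gi * di for gi, di in zip(g, d)) / dd))
        if t == 0.0:
            break
        x = [xi + t * di for xi, di in zip(x, d)]
    return x


def relative_distances(means, covs):
    """r_i for every signature i, following steps (1) to (4)."""
    r = []
    for i, M_i in enumerate(covs):
        P = cholesky(M_i)                               # step (1)
        B = [forward_solve(P, A_j) for A_j in means]    # step (2)
        C_i = closest_in_hull(B[i], B[:i] + B[i + 1:])  # step (3)
        r.append(math.dist(B[i], C_i))                  # step (4)
    return r
```

With three signatures in two channels the opposite face is a line segment, so the iteration settles after a step or two; for larger signature sets an exact quadratic programming method would be the safer choice.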

4. APPLICATION OF THE PROPORTION ESTIMATION CAPABILITY

To assess the usefulness of the techniques described in Section 3, a set of multispectral
scanner data gathered from aircraft altitudes was selected for testing. (Data gathered from
space altitudes would have been preferred, but none was available.) The data set selected was
gathered by The University of Michigan multispectral scanner on 21 August 1970 at a flight
altitude of 10,000 ft over an agricultural area in Lenawee County, Michigan. Twelve channels
of data were recorded with the spectral sensitivities noted in Table 1. Ground truth data had
been secured in conjunction with this flight [4].

The remainder of this section describes the preparation and processing of these data and
discusses the results achieved in applying the proportion estimation techniques.

4.1. DATA PREPARATION 

For proper interpretation of the processing results obtained, the various steps employed 
in preparing and processing the data should be understood. In this section we detail those nec- 
essary steps. 

Because the data were initially recorded in analog form, the first steps consisted of deskew- 
ing and digitizing the data. Deskewing, an operation that eliminates any channel-to-channel mis- 
registration, is accomplished prior to digitizing by utilizing electronic delay lines to bring the 
data into register. (Even though the spectrometer channels are aligned optically, the skew 
problem still exists because of slight differences generally present in the tape recorder record 
and playback head alignment.) 

In digitizing the data we took advantage of certain data collection characteristics to reduce 
noise effects. From an altitude of 10,000 ft, successive scan lines overlapped a considerable 
amount. Therefore, rather than digitizing only selected scan lines and sampling those lines at 
a rate producing a density equivalent to the system optical resolution of 3.3 mr (milliradians), 
we sampled at an equivalent 5 mr rate using the appropriate electronic response to smooth the 
data in the scan direction, and averaged several of the scan lines to cover the same ground area 
which would be covered by a scanner of 5 mr optical resolution. 

The next steps in data preparation were to clamp and scale the data. The purpose of both 
of these operations was to reduce, as much as possible, variations in the data caused by changes 
in collection and recording system characteristics during the data collection flight. One of these 
changes might be a change in the recorder offset value. Changes in offset were accounted for by 
clamping the data to the signal generated during each scan line as the scanner views the dark 

TABLE 1. SPECTRAL SENSITIVITY OF THE
MICHIGAN MULTISPECTRAL SCANNER FOR AUGUST 1970

Spectrometer       50% Response       Peak Response
Channel Number     Points (μm)        Points (μm)

      1            0.41-0.43            0.42
      2            0.43-0.455           0.44
      3            0.455-0.47           0.46
      4            0.47-0.485           0.475
      5            0.485-0.500          0.49
      6            0.500-0.520          0.51
      7            0.520-0.545          0.535
      8            0.545-0.580          0.56
      9            0.580-0.63           0.605
     10            0.63-0.68            0.65
     11            0.68-0.74            0.71
     12            0.75-0.855           0.805

interior of the scanner. Changes in system gain may also occur; these are compensated by
using, as an automatic gain control, the signal generated when the scanner views the fixed
radiance of a reference source.
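
As a rough illustration of clamping and scaling, the sketch below removes the offset using the dark-level signal and normalizes the gain using the reference-source signal for one scan line of one channel. The function and parameter names are hypothetical, and the report does not state the exact arithmetic used, so this is an assumed form of the correction.

```python
def clamp_and_scale(scan_line, dark_level, reference_level, nominal_reference=1.0):
    """Correct one scan line of one channel for offset and gain drift.

    dark_level:      signal recorded while viewing the dark scanner interior
    reference_level: signal recorded while viewing the reference source
    The output is scaled so the reference source reads nominal_reference.
    """
    span = reference_level - dark_level
    if span <= 0:
        raise ValueError("reference level must exceed dark level")
    return [nominal_reference * (v - dark_level) / span for v in scan_line]


if __name__ == "__main__":
    # The same scene recorded with a different offset and gain
    # gives the same corrected values.
    print(clamp_and_scale([12.0, 22.0, 32.0], 2.0, 22.0))
    print(clamp_and_scale([29.0, 49.0, 69.0], 9.0, 49.0))
```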

Upon our completion of these preparatory steps the data were examined to see whether
any systematic variations remained. Not too surprisingly, there was a systematic variation
in the average scanner signal level as a function of scanner view angle. This effect probably
resulted from two factors: the varying effects of the atmosphere as the optical path length
from the sensor to the ground varied with changing view angles, and the bidirectional reflectance
characteristics of the crops. Since variable effects of this sort would be much reduced in the
satellite-based scanners with their smaller total field of view, the systematic angular
variations in the Lenawee County data were eliminated using the ACORN IV preprocessing technique
developed by ERIM personnel [6-9].

The final step in the data preparation phase consisted of generating the space-simulated
data. This was accomplished by averaging data points in both dimensions (across and along
the flight path) to produce single pixels or resolution elements measuring 300 ft on a side.
This approximates the instantaneous ground patch which will be viewed by the multispectral
scanners carried by the ERTS-1 and SKYLAB satellites.
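
A minimal sketch of this block averaging for one spectral channel is given below. The function name and the block dimensions are assumptions: the report does not state the block size, although 5 mr sampling from 10,000 ft corresponds to roughly 50 ft per sample at nadir, which would suggest blocks of about 6 by 6 samples for a 300 ft pixel.

```python
def block_average(channel, block_rows, block_cols):
    """Average non-overlapping blocks of a 2-D grid (list of rows).

    Partial blocks at the right and bottom edges are dropped.
    """
    n_rows = len(channel) // block_rows
    n_cols = len(channel[0]) // block_cols
    out = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            cells = [
                channel[r * block_rows + i][c * block_cols + j]
                for i in range(block_rows)
                for j in range(block_cols)
            ]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out


if __name__ == "__main__":
    print(block_average([[1, 3, 5, 7], [1, 3, 5, 7]], 2, 2))
```

The same routine would be applied to each of the twelve channels so that every simulated pixel keeps a full data vector.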

4.2. SIGNATURE EXTRACTION 

Before processing the space-simulated data it was necessary to establish the characteristics
of the signatures for the major crops and ground covers in the scene so that the computer could
be trained to recognize these objects.

From the ground truth information it was determined that the major constituents in the 
scene were corn, soybeans, bare soil, grain stubble, alfalfa and cut alfalfa. These comprised 
more than 90% of the scene. In order to locate in the data specific areas containing samples 
of these items, a graymap of one spectral channel was generated. (A graymap is a pictorial 
representation of the scene in which specific ranges of scanner signals are assigned distinct 
printer symbols.) From the graymap it was clear that there would be difficulties in locating 
a sufficient number of pure resolution elements to provide samples of each crop type for the 
establishment of crop signatures. (The number of such samples required is discussed in
Appendix I; a possible means for extracting pure signatures from data with no pure samples is
described in Section 5.) The difficulties mentioned above were a result of the relatively low
spatial resolution in the space-simulated data compared with the dimensions of most fields in
the scene. This was the case even though Lenawee County has larger fields than most other
counties in Michigan.

A graymap of one spectral channel of the 10,000 ft data was generated and several fields
of each of the major ground covers were located. Data points were extracted from each of these
fields and signatures were calculated for each field. The signatures thus obtained were
evaluated via geometric signature analysis (see Section 3.5); based on this analysis, those samples
considered to be unrepresentative were eliminated from further consideration. The remaining
signatures in each crop group were then combined to obtain a final crop signature. The final
signature set consisted of signatures for corn, soybeans, bare soil, alfalfa, and cut alfalfa.

4.3. DATA PROCESSING AND EVALUATION 

Before actually processing the set of space-simulated data, we applied the proportion
estimation technique to the training fields in the 10,000 ft data in order to estimate the accuracy that
may be expected in identifying the pure elements of the "space data." The results, as given in
Table 2, show that 80-90% of the data points were correctly classified, with the remainder
classified either as one of the other four materials or as an alien object. These results are
similar to those obtained using the conventional classification algorithms on other data sets.
Therefore, whether we use conventional recognition or the proportion estimation algorithm, we
can expect comparable accuracy for those data points or pixels containing only one object
class.

We shall now discuss the results of estimating proportions from the space-simulated data.
The entire data set was processed using three methods: (1) the conventional classification ap- 
proach wherein each pixel is classified either as being an alien object or one, and only one, of 
the five classes, (2) the proportion estimation approach in which each pixel is classified as 
being an alien object or a combination of the five classes, and (3) the proportion estimation 
approach with averaging where (a) each pixel is classified as to whether or not it represents 
an alien object and (b) all non-alien object points are averaged and the proportions then esti- 
mated from the single averaged data point. 

TABLE 2. RESULTS OF ESTIMATING PROPORTIONS
ON TRAINING AREAS OF 10K FT, AUGUST 1970,
LENAWEE COUNTY DATA

                 Correctly      Incorrectly     Classified
Material         Classified     Classified      as Alien
                    (%)            (%)             (%)

Corn                88.0           12.0            2.5
Soybeans            84.8           15.2            1.9
Bare Soil           90.4            9.6            4.6
Alfalfa             80.3           18.7            1.7
Cut Alfalfa         82.0           18.0            3.4

Table 3 summarizes the results of classifying the entire space-simulated data set.
Although ground truth information was not available for the entire scanned scene, we still find
the results (Table 3) quite interesting. At first glance the results of the three methods look
so similar that use of any approach more complicated than standard classification algorithms
might well seem questionable. Certainly the results for corn are essentially the same; but
as one scans down the table the differences become as great as 14% for cut alfalfa. However,
even this large a difference might not be too important if the truth lay between the two
estimates.

We decided to examine in more detail the results for certain areas of the scene where re- 
liable ground truth was available. Some fairly typical findings of this examination are shown 
in Figs. 9 and 10; these findings are discussed and some general conclusions drawn in the fol- 
lowing paragraphs. 

Figure 9 presents the results for an area containing each of the five major classes as well
as an alien object, namely a field of grain stubble. As in Fig. 1 the dashed-line squares
represent individual pixels. In this example all pixels included at least two object classes. Note that
the field boundaries are skewed with respect to the pixels. This is because the scanner aircraft
flew at a crab angle of several degrees to correct the flight path for effects of a crosswind.

On examining the table included as a part of Fig. 9 it is obvious that the results achieved 
in estimating proportions are much more accurate than those using the standard classification 
algorithm. Even though 26% of the area actually contained soybeans the standard algorithm 
recognized no soybeans; at the same time, the amounts of corn and cut alfalfa were grossly 
overestimated. When estimating proportions on the average point, the cut alfalfa was also 
overestimated. No alien elements were identified by any of the methods — possibly because 
the alien object, a grain stubble field, had low weeds and a radiance spectrum resembling that 
of cut alfalfa. Overall, the point-by-point proportion estimates produced the most accurate 
results. 
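
The comparison can be made concrete with the proportions tabulated in Fig. 9. The sketch below computes the mean square error of each method against ground truth for that area. Averaging over all six categories, including the alien class, is a choice made here for illustration; the report does not say how its own mean square errors were formed.

```python
# Proportions for Area 1 as tabulated in Fig. 9, in the order:
# corn, soybeans, bare soil, alfalfa, cut alfalfa, alien.
GROUND_TRUTH = [0.262, 0.262, 0.069, 0.094, 0.194, 0.119]
ESTIMATES = {
    "standard": [0.500, 0.000, 0.000, 0.125, 0.375, 0.000],
    "point_by_point": [0.380, 0.199, 0.074, 0.091, 0.256, 0.000],
    "average_point": [0.347, 0.220, 0.019, 0.029, 0.386, 0.000],
}


def mean_square_error(estimate, truth):
    """Mean of squared differences between two proportion vectors."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


MSE = {name: mean_square_error(est, GROUND_TRUTH) for name, est in ESTIMATES.items()}

if __name__ == "__main__":
    for name, value in MSE.items():
        print(f"{name:15s} {value:.4f}")
```

On these figures the point-by-point estimates have the smallest error, the average-point estimate is second, and the standard classification is largest, which agrees with the ranking stated in the text.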

The area presented in Fig. 10 also includes all five major classes as well as some alien
objects: the farmstead, the road, and the idle field. Some of the fields were large enough to
fully contain one or more pixels within their boundaries. The accuracy of the standard
classification technique was once again quite poor. Soybeans and bare soil were underestimated
whereas the cut alfalfa was greatly overestimated. A total of nine pixels straddled the
boundaries between fields of bare soil and soybeans. Upon examining these pixels we found that
each was classified as cut alfalfa by the standard classification technique. But when we used
the proportion estimation algorithm, the alien objects were overestimated. This is
understandable since each point was classified as being either totally alien or a combination of the
other five classes; and if more than two of the elements containing the road were classified

TABLE 3. PROPORTIONS OF MATERIALS FOR AUGUST 1970
LENAWEE COUNTY DATA SET USING THREE CLASSIFICATION
APPROACHES AFTER COMBINING DATA POINTS TO SIMULATE
300 X 300 FT RESOLUTION

                Conventional           Point by Point          Proportion Estimates
                Point by Point         Proportion Estimates    on a Single Point
                Classification         (as many as five        Representing the
Material        (one class             classes per point)      Average of Non-alien
                per point)                                     Points in the Scene

Corn               0.264                  0.248                   0.253
Soybeans           0.129                  0.196                   0.158
Bare Soil          0.034                  0.101                   0.082
Alfalfa            0.269                  0.173                   0.207
Cut Alfalfa        0.249                  0.109                   0.127
Alien              0.055                  0.173                   0.173

                                STANDARD           PROPORTION       PROPORTION
                GROUND          CLASSIFICATION     ESTIMATES        ESTIMATE
MATERIAL        TRUTH           (PT BY PT)         (PT BY PT)       ON AVG PT

CORN            0.262           0.500              0.380            0.347
SOYBEANS        0.262           0.000              0.199            0.220
BARE SOIL       0.069           0.000              0.074            0.019
ALFALFA         0.094           0.125              0.091            0.029
CUT ALFALFA     0.194           0.375              0.256            0.386
ALIEN           0.119           0.000              0.000            0.000

FIGURE 9. COMPOSITION OF A PORTION (AREA 1) OF SPACE-SIMULATED LENAWEE
COUNTY DATA AS DETERMINED BY FOUR METHODS

FIGURE 10. COMPOSITION OF A PORTION (AREA 2) OF SPACE-SIMULATED LENAWEE
COUNTY DATA AS DETERMINED BY FOUR METHODS

as alien) that category would be overestimated. The best overall results were again achieved
by estimating proportions on a point-by-point basis; estimating proportions from the average
point was a close second.

Several other areas were examined with results similar to those already described. Two
observations seemed to hold true for almost all the areas: the mean square error of the
point-by-point proportion estimates was much less than for the standard classification; and the
errors for the standard classification were both positive and negative. Perhaps these traits
explain how the proportions of materials estimated for the entire scene and listed in Table 3
could be so similar. Apparently the scene was so structured that the standard technique erred
almost equally on both sides of the truth.

In sum, our results indicate that, in those Lenawee County test regions for which reliable
ground truth was available, the proportion estimation technique provided a somewhat better
estimate of crop area than the conventional classification algorithm. It would seem that
proportion estimation would be the better technique for any agricultural area.

5. EXTRACTING SIGNATURES FROM LOW RESOLUTION DATA

Our experience with the single set of space-simulated data showed that, when operating
on data gathered at space altitude, some difficulty may be encountered in obtaining signatures
in the usual manner since there may not be a sufficient number of pixels containing pure
samples of the materials to be recognized (see Appendix I). Indeed it is possible that for some
regions, every pixel will contain a mixture. Hence a method is needed for obtaining signatures
from data representing mixtures of the specified materials.

It is desired to compute the signatures from pairs of data vectors y and proportion vectors 
p (obtained from ground truth). A general solution to this problem has not been obtained, but 
we found a mathematical solution for the case where the covariance matrices are equal. This 
solution is described in the following paragraphs. 

Let $m$ be the number of materials or signatures to be obtained, $n$ the number of spectral
channels, and $N$ the number of samples (pixels) for which multispectral data and proportion
vectors are available. Assume that $N \geq m$ and that there is a common covariance matrix $M$
for the signatures to be estimated. Assume also that the $N$ samples are statistically
independent. Then the $m$ by $n$ matrix $A$, whose rows are the signature mean vectors, may be estimated
by least squares. Let

$$Q = \sum_{k=1}^{N} \left\| (p^k)^T A - (y^k)^T \right\|^2$$

where $y^k$ is a data vector and $p^k$ is the corresponding ground truth proportion vector. Also let
$P$ be the $N$ by $m$ matrix whose rows are the $(p^k)^T$, and $Y$ the $N$ by $n$ matrix whose rows are
the $(y^k)^T$. We may write

$$Q = \sum_{i=1}^{n} Q_i, \qquad Q_i = \sum_{k=1}^{N} \Big( \sum_{j=1}^{m} p^k_j A_{ji} - y^k_i \Big)^2$$

and minimize each $Q_i$ separately. This leads to $n$ least-squares problems, each to estimate
$m$ quantities $A_{ji}$. It follows that

$$\frac{\partial Q_i}{\partial A_{ji}} = 2\left[ (P^T P)A - P^T Y \right]_{ji}$$

Provided that all $A_{ji} > 0$, a necessary condition for the optimal estimate is

$$\frac{\partial Q_i}{\partial A_{ji}} = 0 \quad \text{for } i = 1, \ldots, n;\; j = 1, \ldots, m$$

so that $(P^T P)A = P^T Y$. Thus the equation for estimating $A$ is

$$A = (P^T P)^{-1} P^T Y$$
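
A small numerical sketch of this estimate follows. It assumes that $P$ holds one ground truth proportion vector per row and $Y$ the matching data vector per row, and it solves the normal equations $(P^T P)A = P^T Y$ by Gauss-Jordan elimination. The function names are hypothetical, and the report states that the approach itself was not tested.

```python
def solve(G, R):
    """Solve G X = R (G square) by Gauss-Jordan elimination with pivoting."""
    m = len(G)
    aug = [list(G[i]) + list(R[i]) for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("P^T P is singular: proportions do not separate the materials")
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]


def estimate_signature_means(P, Y):
    """Return the m-by-n matrix A solving (P^T P) A = P^T Y.

    P: N rows of m proportions; Y: N rows of n channel values.
    """
    m, n = len(P[0]), len(Y[0])
    PtP = [[sum(p[a] * p[b] for p in P) for b in range(m)] for a in range(m)]
    PtY = [[sum(p[a] * y[i] for p, y in zip(P, Y)) for i in range(n)] for a in range(m)]
    return solve(PtP, PtY)


if __name__ == "__main__":
    # Two materials, two channels, three noise-free mixture pixels.
    P = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
    Y = [[10.0, 20.0], [20.0, 30.0], [25.0, 35.0]]
    print(estimate_signature_means(P, Y))
```

The example recovers the signature means (10, 20) and (30, 40) from mixed pixels; with noisy data or inaccurate ground truth proportions the estimate degrades accordingly.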

When $A$ has been estimated, it is also possible to estimate the covariance matrix $M$. Since the
pure signature covariances are equal, all mixture signature covariances are independent of $p$.
Let $X$ be a matrix whose rows $X_i$ are data vectors from the $m$ pure signature distributions.
Then

$$\operatorname{cov}(p^T X) = \sum_{i=1}^{m} p_i \operatorname{cov} X_i = \sum_{i=1}^{m} p_i M = M$$

Thus $M$ can be estimated by the sample covariance $S$.

The above approach has not been tested, but certain difficulties are obvious: suitable
estimates of $A$ and $S$ can be obtained only if (1) enough data $(p^k, y^k)$ are available and (2) the ground
truth values $p^k$ are sufficiently accurate. This stringency may necessitate changes in data
handling as well as in the procedure for obtaining ground truth. Even for well known areas, it
will be difficult to locate the boundaries of fields relative to boundaries of individual pixels in
data gathered from space altitudes.

6. CONCLUSIONS AND RECOMMENDATIONS

As discussed earlier in this report, there is a need for multispectral data processing 
methods whereby the proportions of objects and materials appearing within the instantaneous 
field of view of a scanning system may be estimated. Toward filling this need, we have de- 
veloped some promising techniques and tested them to determine their utility in estimating 
crop acreage from space. 

The test results were quite encouraging in that our proportion estimation techniques pro- 
duced more accurate crop acreage estimates than did a parallel application of the conventional 
classification technique. Certain problems still exist, however, in applying these new tech- 
niques operationally. 

One of these problems is the computer time required to estimate proportions on a
point-by-point basis. For an ERTS frame, this time requirement is significantly more than for the
conventional classification approach. In its present implementation on the Control Data 1604, more
than 100 hours will be required for estimating proportions; this may be compared to 17 hours
for the conventional approach. The present implementation employs the method of Theil and
van de Panne. Another method we examined (the Complementary Pivot Method) is faster and
would be applicable if a problem of numerical instability could be eliminated.

Some work has also been done toward finding an unconstrained solution to the mixtures 
problem, but to date this approach has produced much less accurate results than we had hoped 
for. 

There is also the problem of obtaining signatures from mixtures data; and, as indicated in
Section 5, it may be necessary to derive such signatures at space altitudes. A mathematical
solution to the problem has been presented in Section 5; however, this solution may not be
practical unless certain changes are made in the data handling procedures. The question of optimum
thresholds in application of the alien object detection criteria remains unanswered at this time.

In this experiment the alien object threshold numbers $s$ and $\phi$ were determined
empirically in order to reject about 10% of the training set data points. It would have been better,
however, to have a definite algorithm for choosing values of $s$ and $\phi$; we feel that this merits
further work.

Although the problem areas identified above need additional work, we believe that the test 
results reported herein are sufficiently encouraging to justify further tests on satellite multi- 
spectral scanner data. We plan to carry out such tests on ERTS-1 MSS data in the near future. 

Appendix I

NUMBER OF SAMPLES REQUIRED FOR TRAINING 

In training a computer to automatically and accurately classify a scene, the signatures used
must likewise be accurate. And to ensure accurate signatures (means, variances, and
covariances) some minimum number of samples is required. This appendix addresses the
number-of-samples problem; it is based on an internal memorandum written by R. Crane and
W. Richardson [5].

The derivation of Ref. 5 is based on signal-to-noise ratio, $R$. The sample (estimated)
covariance is

$$S = \frac{1}{N-1} \sum_{k=1}^{N} (X^k - \bar{X})(X^k - \bar{X})^T$$

where $\bar{X} = E(X^k)$ and $N$ is the number of samples, and the variance of $S_{ij}$ is

The percentage errors (E), representing inaccuracy in the covariance and variance terms,
are $100/R_{ij}$ and $100/R_{ii}$ respectively. The upper limit for both of these terms for a fixed $N$ is
$100\sqrt{2/N}$. This quantity is plotted in Fig. I-1 for $N$ ranging from 3 to 1000. It is clear that
as the number of samples is reduced, the percent inaccuracy increases dramatically. Even for
what may seem to be a reasonably large number of samples, the inaccuracy is fairly sizable;
for example, 100 samples still allow an inaccuracy of 14%. Obviously, a large number of
training samples must be available if the characteristics of the signatures are to be established with
desired accuracy.
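
The bound quoted above is easy to tabulate. The short sketch below evaluates $100\sqrt{2/N}$ and also inverts it to give the number of samples needed for a target percentage error; the function names are hypothetical, and the inversion is simple algebra on the same formula rather than something stated in the report.

```python
import math


def max_percent_error(n_samples):
    """Upper limit 100*sqrt(2/N) on the percentage error for N samples."""
    return 100.0 * math.sqrt(2.0 / n_samples)


def samples_needed(percent_error):
    """Smallest N for which 100*sqrt(2/N) does not exceed percent_error."""
    return math.ceil(2.0 * (100.0 / percent_error) ** 2)


if __name__ == "__main__":
    for n in (3, 10, 30, 100, 300, 1000):
        print(f"N = {n:5d}   max error = {max_percent_error(n):5.1f}%")
```

For N = 100 the bound evaluates to about 14.1%, in line with the 14% figure in the text.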

Appendix II

DETAILS OF THE ALIEN OBJECT TESTS 

The alien object tests were briefly described in Section 3.3. In this appendix we provide 
more details. 


Hyperplane Criterion 

Preliminary Computations 

First, some coordinate transformations must be carried out. In order to deal with the
extended simplex in geometric terms, it is necessary to reduce the dimension of the signature
mean vectors from $n$ (number of channels) to $m-1$ (number of signatures or vertices less one).

If the signature simplex is nondegenerate (which we assume), then there is an
$(m-1)$-dimensional basis whose space includes all the points of the simplex. This basis is a set of
$n$-vectors $D_1, \ldots, D_{m-1}$. These are obtained as follows.

Let $B_1, \ldots, B_m$ be the transformed signature means.

Define $C_{i-1} = B_i - B_c$ for $i = 2, \ldots, m$, where $B_c = (1/m)\sum_{i=1}^{m} B_i$ is the centroid of
the simplex.

Construct the orthonormal basis $D_1, \ldots, D_{m-1}$. The $D_i$ are defined by

$$D_1 = C_1 / \|C_1\|$$

$$D_2 = \left[ -(D_1 \cdot C_2)D_1 + C_2 \right] / \left\| -(D_1 \cdot C_2)D_1 + C_2 \right\|$$

$$D_3 = \left[ -(D_1 \cdot C_3)D_1 - (D_2 \cdot C_3)D_2 + C_3 \right] / \left\| -(D_1 \cdot C_3)D_1 - (D_2 \cdot C_3)D_2 + C_3 \right\|$$

$$D_k = \Big[ C_k - \sum_{i=1}^{k-1} (D_i \cdot C_k) D_i \Big] \Big/ \Big\| C_k - \sum_{i=1}^{k-1} (D_i \cdot C_k) D_i \Big\|$$

Since the $D_i$ are a set of mutually orthogonal unit vectors, the $B_i$ can be represented in the new
basis by the $(m-1)$-vectors $V_i$ defined by

$$V_i = (B_i \cdot D_1,\; B_i \cdot D_2,\; \ldots,\; B_i \cdot D_{m-1})$$
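
The basis construction above is a Gram-Schmidt process and can be sketched directly. The function names are hypothetical, and the sketch assumes a nondegenerate simplex, so that no division by a zero norm occurs.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def simplex_basis(B):
    """Orthonormal n-vectors D_1..D_{m-1} from the m transformed means B."""
    m, n = len(B), len(B[0])
    centroid = [sum(b[i] for b in B) / m for i in range(n)]
    # C_{i-1} = B_i - B_c for i = 2..m
    C = [[bi - ci for bi, ci in zip(b, centroid)] for b in B[1:]]
    D = []
    for c in C:
        w = list(c)
        for d in D:
            proj = dot(d, c)                      # (D_i . C_k)
            w = [wi - proj * di for wi, di in zip(w, d)]
        norm = math.sqrt(dot(w, w))
        D.append([wi / norm for wi in w])
    return D


def reduced_coordinates(B, D):
    """V_i = (B_i . D_1, ..., B_i . D_{m-1}) for every signature mean."""
    return [[dot(b, d) for d in D] for b in B]


if __name__ == "__main__":
    B = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
    D = simplex_basis(B)
    print(reduced_coordinates(B, D))
```

Because the $D_i$ span the plane of the simplex, distances between vertices are the same in the reduced coordinates as in the original channel space.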


Second, certain constants must be obtained for testing each data point y. These, of course, 
are related to the equations of the hyperplanes that form the faces of the extended simplex. 
Having worked out these equations, we can formulate a computationally efficient scheme for 
determining whether a point is inside or outside the extended simplex. 

Assuming nondegeneracy of the original signature simplex, none of its face hyperplanes
will pass through the origin.* Thus the equation of the k-th hyperplane is of the form
$a_k \cdot x = \gamma_k$, where $a_k$ is a normal to the hyperplane and $x$ is any point in it. The face
hyperplane opposite the vertex $V_k$ (or $B_k$) must pass through the vertices
$V_1, \ldots, V_{k-1}, V_{k+1}, \ldots, V_m$. Thus for an arbitrary scalar factor $\gamma_k$ we have

$$a_k = \gamma_k
\begin{bmatrix}
V_{1,1} & V_{1,2} & \cdots & V_{1,m-1} \\
\vdots & \vdots & & \vdots \\
V_{k-1,1} & V_{k-1,2} & \cdots & V_{k-1,m-1} \\
V_{k+1,1} & V_{k+1,2} & \cdots & V_{k+1,m-1} \\
\vdots & \vdots & & \vdots \\
V_{m,1} & V_{m,2} & \cdots & V_{m,m-1}
\end{bmatrix}^{-1}
\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$$

and the unit normal to the k-th hyperplane is given by $N_k = a_k / \|a_k\|$. The hyperplane
equation is now

$$N_k \cdot X = \frac{\gamma_k}{\|a_k\|}$$

Now consider the equation of the k-th face hyperplane in the extended simplex, $s$ units away.
Clearly the hyperplane $N_k \cdot X = \gamma_k / \|a_k\|$ divides the space into two regions

$$R_1 = \left\{ X : N_k \cdot X < \frac{\gamma_k}{\|a_k\|} \right\}$$

$$R_2 = \left\{ X : N_k \cdot X > \frac{\gamma_k}{\|a_k\|} \right\}$$

If $V_k \in R_1$ then the equation of the k-th face hyperplane of the extended simplex is

$$N_k \cdot X = \frac{\gamma_k}{\|a_k\|} + s$$

Otherwise it is

$$N_k \cdot X = \frac{\gamma_k}{\|a_k\|} - s$$

*Note that by construction, the origin of the simplex is at its centroid.

The constants $N_k$ and $h_k$ (the right-hand side of the k-th extended-simplex face equation)
will be used to determine whether the mixture corresponding to a point $y$ is likely to contain
a significant proportion of unknown material.

Hyperplane Criterion Applied to Each Point

As noted earlier, a data point or resolution element is to be flagged whenever the point $Z'$
lies outside the extended simplex. This criterion is easy to define mathematically. A point $Z'$
is inside the extended simplex whenever it is on the same side of the k-th hyperplane (extended
simplex) as $V_k$, for all $k$. Further, $Z'$ is on the same side of the k-th hyperplane as $V_k$ if and
only if

$$\operatorname{sgn}(N_k \cdot Z' - h_k) = \operatorname{sgn}(N_k \cdot V_k - h_k)$$

The value of $\operatorname{sgn}(N_k \cdot V_k - h_k)$ is a constant independent of data and can be computed in
advance of processing. In terms of the observed data point $y$,

$$Z' = D L^{-1} y \qquad (Z',\, y \text{ column vectors})$$

where $D$ is the matrix whose rows are $D_1, \ldots, D_{m-1}$,

$$L L^T = \bar{M}$$

and where $\bar{M}$ is the average covariance matrix of the signatures. If $N_k$ is a row vector, then

$$N_k \cdot Z' = (N_k D L^{-1})\, y$$

The row vector (matrix product) $N_k D L^{-1}$ can be obtained in advance, and need not be computed
for every $y$. It is not necessary to transform $y$ in order to apply the hyperplane criterion.

Programming Details 

A subroutine in MIXMAP does all preliminary computations and computes the matrix
products $N_k D L^{-1}$ and the constants $h_k$. Required inputs are $s$, $m$, $n$ (number of channels), and
the signatures. One additional output is

$$\text{ISIGN} = \sum_{k=1}^{m} I(k)\, 2^{k-1}$$

where $I(k)$ is 1 or 0 according to the sign of $N_k \cdot V_k - h_k$. To check a data point $y$, check the
sign bit of $(N_k D L^{-1} \cdot y - h_k)$ against the k-th bit of ISIGN, for every $k$. If these bits are different for
any k, the point y is flagged. It is also flagged if I ly “ A^ 1 1 is large. 
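
The bit-packing scheme can be illustrated with a short Python sketch. It rests on two assumptions that should be checked against the MKMAP source: the k-th bit of ISIGN is taken to be the bit of weight 2^(k-1), and a bit value of 1 is taken to mean a negative reference value, as a machine sign bit would. The function names are hypothetical.

```python
def make_isign(ref_values):
    """Pack the signs of N_k . V_k - h_k (k = 1..m) into one integer.

    ref_values[k-1] holds the reference value for face k; bit k-1 of the
    result is set when that value is negative (assumed convention).
    """
    isign = 0
    for k, value in enumerate(ref_values, start=1):
        if value < 0:
            isign |= 1 << (k - 1)
    return isign


def is_flagged(rows, h, y, isign):
    """Return True if y lies outside the extended simplex.

    rows[k-1] is the precomputed row N_k D L and h[k-1] is h_k.
    """
    for k, (row_k, h_k) in enumerate(zip(rows, h), start=1):
        value = sum(w * yi for w, yi in zip(row_k, y)) - h_k
        sign_bit = 1 if value < 0 else 0   # a value of exactly 0 counts as non-negative
        if sign_bit != (isign >> (k - 1)) & 1:
            return True
    return False
```

Packing the reference signs into a single word lets the per-point loop stop at the first face whose sign disagrees.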

Just as the hyperplane criterion deals with the projection of Z onto the space spanned by
the $D_j$, the out-of-plane test deals with the projection of Z onto the orthogonal complement of
this space. However, the latter test is simpler in that it is concerned only with the magnitudes
of the components of Z orthogonal to the space spanned by the $D_j$. It is sufficient to complete
the basis for n-space, where n is the number of spectral channels.

The dimension of the space spanned by the $D_j$ is m-1, where m is the number of signatures.
The $D_j$ form an orthonormal basis for this space. Therefore, the $D_j$ form part of an orthonormal
basis for n-space. The additional vectors $A_1, \ldots, A_{n-m+1}$ form an orthonormal
basis for the orthogonal complement of the space spanned by the $D_j$. The additional vectors
can be obtained by a Gram-Schmidt type of process. Let

    $A_K = E_{K+r} + \sum_{j=1}^{m-1} \alpha_j D_j + \sum_{j=1}^{K-1} \beta_j A_j$

where $E_\ell$ is a unit vector with $\ell$-th component 1 and the others 0, and $\alpha_j$ and $\beta_j$ are unknown
coefficients to be obtained. We want the $A_i$ to have the property that

    $A_i \cdot A_j = 0$   if $i \ne j$

    $A_i \cdot A_j = 1$   if $i = j$

    $A_j \cdot D_K = 0$   for all $j = 1, \ldots, n-m+1$ and $K = 1, \ldots, m-1$

i.e., the $D_K$ and $A_j$ together form an orthonormal basis for n-space. The number r is initially
0 but may be incremented by 1 whenever $E_{K+r}$ turns out to be a linear combination of
$D_1, \ldots, D_{m-1}, A_1, \ldots, A_{K-1}$. The desired basis is obtained if

    $\alpha_j = -D_{j,K+r}$

    $\beta_j = -A_{j,K+r}$

where $D_{j,K+r}$ and $A_{j,K+r}$ denote the (K+r)-th components of $D_j$ and $A_j$; that is, if the $A_K$ are defined by

    $A_K = E_{K+r} - \sum_{j=1}^{m-1} D_{j,K+r} D_j - \sum_{j=1}^{K-1} A_{j,K+r} A_j$

for K = 1, . . . , n - m + 1. If

    $A = \begin{bmatrix} A_1 \\ \vdots \\ A_{n-m+1} \end{bmatrix}$

and Z is a column vector, then AZ is the desired out-of-plane projection.

The hyperplane criterion determines whether the in-plane projection of Z is more than s standard
deviations from the simplex. The out-of-plane criterion tests whether any out-of-plane
components of Z are farther from the simplex than the "out-of-plane factor" specified in the
control input. The value of this factor is specified by the user and thus provides
some flexibility in screening for alien objects.
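
The basis completion and the out-of-plane test can be sketched together in Python. This is an illustration rather than the MKMAP code, and it makes several assumptions: the rows of d_rows are already orthonormal, each new vector is rescaled to unit length (which the unit-length condition requires but whose exact form in the report is not legible), Z is measured from a point in the plane of the simplex, and the names complete_basis, out_of_plane_flag, and factor are hypothetical.

```python
import math


def complete_basis(d_rows, n, tol=1e-10):
    """Return orthonormal vectors A_1..A_{n-m+1} orthogonal to the rows D_j."""
    basis = [list(row) for row in d_rows]   # the D_j, then each accepted A_K
    extra = []
    index = 0                               # 0-based position of K + r
    while len(extra) < n - len(d_rows) and index < n:
        # Start from the unit vector E_{K+r} and remove its components along
        # every vector found so far; the coefficient for b is -b[index].
        cand = [0.0] * n
        cand[index] = 1.0
        for b in basis:
            coeff = b[index]
            cand = [c - coeff * bi for c, bi in zip(cand, b)]
        norm = math.sqrt(sum(c * c for c in cand))
        if norm > tol:
            unit = [c / norm for c in cand]  # rescaling to unit length (assumed)
            basis.append(unit)
            extra.append(unit)
        # Otherwise E_{K+r} is a linear combination of the vectors already
        # found, so it is skipped; this corresponds to incrementing r.
        index += 1
    return extra


def out_of_plane_flag(a_rows, z, factor):
    """Flag z if any component of the projection A z exceeds factor in magnitude."""
    return any(abs(sum(a * zi for a, zi in zip(row, z))) > factor
               for row in a_rows)
```

With a single in-plane vector D_1 = (0.6, 0.8, 0) in three channels, the second unit vector is rejected as linearly dependent, which is the case the counter r exists to handle.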